Empowering OLAC Extension using Anusaaraka and Effective text processing using Double Byte coding
Author
Abstract
The paper reviews the hurdles encountered in implementing the OLAC extension for Dravidian/Indian languages and explores possibilities that could minimise or solve these problems. In this context, the Chinese system of text processing and the Anusaaraka system are scrutinised.
Similar resources
Using Chinese Text Processing Technique for the Processing of Sanskrit Based Indian Languages: Maximum Resource Utilization and Maximum Compatibility
Chinese text processing systems use double-byte coding, while almost all existing Sanskrit-based Indian languages have used single-byte coding for text processing. As can be observed, Chinese information processing technology has already achieved substantial technical development in both the East and the West. In contrast, Indian languages are being processed by computer, more or less, for word ...
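The contrast between single-byte and double-byte coding can be made concrete with a small measurement. Python ships no ISCII codec, so this sketch does not reproduce a legacy single-byte Indian encoding; it only compares UTF-16, where every Devanagari code point occupies exactly one 16-bit unit, with UTF-8, where each occupies three bytes. The sample word and the helper names `byte_widths` and `fits_in_16_bits` are illustrative assumptions, not taken from the paper.

```python
# Compare how many bytes a Devanagari word needs under a fixed two-byte
# encoding (UTF-16, little-endian, no BOM) and under variable-width UTF-8.

# "Sanskrit" in Devanagari, written with escapes so the code points are explicit:
# SA, ANUSVARA, SA, VIRAMA, KA, VOWEL SIGN VOCALIC R, TA
WORD = "\u0938\u0902\u0938\u094d\u0915\u0943\u0924"


def byte_widths(text: str) -> dict[str, int]:
    """Return the code-point count and the encoded size in two encodings."""
    return {
        "code_points": len(text),
        "utf-16-le": len(text.encode("utf-16-le")),
        "utf-8": len(text.encode("utf-8")),
    }


def fits_in_16_bits(text: str) -> bool:
    """True if every character lies in the Basic Multilingual Plane (U+0000-U+FFFF)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


if __name__ == "__main__":
    print(byte_widths(WORD))  # {'code_points': 7, 'utf-16-le': 14, 'utf-8': 21}
    print(fits_in_16_bits(WORD))  # True
```

Because the whole Devanagari block (U+0900 to U+097F) lies in the Basic Multilingual Plane, a double-byte representation stores each code point in a fixed width, which is the property the Chinese systems exploit.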
An OLAC Extension for Dravidian Languages
OLAC (the Open Language Archives Community) was founded in 2000 to create online databases of language resources. This paper reviews the bottom-up, distributed character of the project and proposes an extension of its architecture for Dravidian languages. An ontological structure is considered for effective natural language processing (NLP), and its advantages over statistical methods are reviewed.
Achieving Better Compression Applying Index-based Byte-Pair Transformation before Arithmetic Coding
Arithmetic coding is used in the entropy-encoding stage of many compression techniques. Further compression is not possible without changing the data model and increasing the redundancy in the data set. To increase redundancy, we have applied index-based byte-pair transformation (BPT-I) as a pre-processing step before arithmetic coding. BPT-I transforms the most frequent byte-pairs (2-byte integers). He...
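The blurb does not specify BPT-I beyond its name, so the following is only a sketch of the general idea it builds on: classic byte-pair encoding, which replaces the most frequent adjacent byte pair with a byte value that does not occur in the data and records the pair so the step can be undone. It is not the index-based variant from the paper, and the function names are hypothetical.

```python
from collections import Counter
from typing import Optional, Tuple


def most_frequent_pair(data: bytes) -> Optional[Tuple[int, int]]:
    """Most common adjacent byte pair; ties go to the pair seen first."""
    counts = Counter(zip(data, data[1:]))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def substitute_pair(data: bytes) -> Optional[Tuple[bytes, int, Tuple[int, int]]]:
    """One substitution step: replace the most frequent pair with an unused byte.

    Returns (new_data, token, pair), or None if there is no pair or no free byte.
    """
    pair = most_frequent_pair(data)
    used = set(data)
    token = next((b for b in range(256) if b not in used), None)
    if pair is None or token is None:
        return None
    out = bytearray()
    i = 0
    while i < len(data):
        # Greedy left-to-right scan: non-overlapping occurrences are replaced.
        if i + 1 < len(data) and (data[i], data[i + 1]) == pair:
            out.append(token)
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out), token, pair


def undo_substitution(data: bytes, token: int, pair: Tuple[int, int]) -> bytes:
    """Invert one step; exact because the token never occurred in the input."""
    return data.replace(bytes([token]), bytes(pair))


if __name__ == "__main__":
    encoded, token, pair = substitute_pair(b"abababcd")
    print(encoded, token, bytes(pair))  # b'\x00\x00\x00cd' 0 b'ab'
    assert undo_substitution(encoded, token, pair) == b"abababcd"
```

Repeating the step builds a table of substitutions; whether the transformed stream is easier for a subsequent arithmetic coder to compress depends on the data.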
High capacity steganography tool for Arabic text using 'Kashida'
Steganography is the ability to hide secret information in cover media such as sound, pictures and text. A new approach is proposed to hide a secret in Arabic text cover media using "Kashida", an Arabic extension character. The proposed approach attempts to maximize the use of "Kashida" so as to hide more information in Arabic text cover media. To this end, some algorithms have been des...
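The blurb does not give the embedding algorithms, so here is a deliberately simplified kashida scheme for illustration only: at every position where a kashida could stretch a joining letter toward the next letter, insert U+0640 ARABIC TATWEEL to encode a 1 and leave the text unchanged to encode a 0. The letter sets are a simplification of the Unicode joining types, the scheme ignores typographic rules such as the lam-alef ligature, and it is not the method proposed in the paper; all names are hypothetical.

```python
KASHIDA = "\u0640"  # ARABIC TATWEEL, the kashida/extension character

# Simplified letter sets (assumption): ARABIC_LETTERS are letters a kashida may
# precede; DUAL_JOINING letters join to the following letter, so one may follow them.
ARABIC_LETTERS = set("ءآأؤإئابةتثجحخدذرزسشصضطظعغفقكلمنهوىي")
DUAL_JOINING = set("ئبتثجحخسشصضطظعغفقكلمنهي")


def carrier_slots(text: str) -> list[int]:
    """Indices i after which a kashida can be inserted (text[i] joins text[i+1])."""
    return [
        i
        for i in range(len(text) - 1)
        if text[i] in DUAL_JOINING and text[i + 1] in ARABIC_LETTERS
    ]


def embed(cover: str, bits: list[int]) -> str:
    """Hide bits in order: kashida after a slot means 1, no kashida means 0."""
    if KASHIDA in cover:
        raise ValueError("cover text must not already contain kashidas")
    slots = carrier_slots(cover)
    if len(bits) > len(slots):
        raise ValueError(f"need {len(bits)} slots, cover has {len(slots)}")
    marked = {slot for slot, bit in zip(slots, bits) if bit}
    out = []
    for i, ch in enumerate(cover):
        out.append(ch)
        if i in marked:
            out.append(KASHIDA)
    return "".join(out)


def extract(stego: str, n_bits: int) -> list[int]:
    """Recover the first n_bits by revisiting the same slots in the stego text."""
    bits: list[int] = []
    i = 0
    while i < len(stego) - 1 and len(bits) < n_bits:
        ch, nxt = stego[i], stego[i + 1]
        if ch in DUAL_JOINING and nxt == KASHIDA:
            bits.append(1)
            i += 2  # skip the kashida; the letter after it is examined next
            continue
        if ch in DUAL_JOINING and nxt in ARABIC_LETTERS:
            bits.append(0)
        i += 1
    return bits


if __name__ == "__main__":
    cover = "بسم الله الرحمن الرحيم"
    secret = [1, 0, 1, 1, 0]
    stego = embed(cover, secret)
    print(extract(stego, len(secret)) == secret)  # True
    print(stego.replace(KASHIDA, "") == cover)  # True
```

Capacity here is one bit per eligible letter junction; schemes that aim to maximise kashida use, as the paper does, would encode more information per slot.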
Processing Text Files as Is: Pattern Matching over Compressed Texts, Multi-byte Character Texts, and Semi-structured Texts
Techniques for processing text files “as is” are presented, in which the given text files are processed without modification. The compressed pattern matching problem, first defined by Amir and Benson (1992), is a good example of the “as-is” principle. Another example is string matching over multi-byte character texts, which is a significant problem common to oriental languages such as Japanese, Kore...
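The multi-byte pitfall mentioned above can be reproduced with Python's standard `shift_jis` codec: a naive byte search for the ASCII letter `A` (0x41) matches the trailing byte of the katakana ア (0x83 0x41). The sketch below searches the encoded bytes directly but rejects matches that do not start on a character boundary. The lead-byte ranges are a simplification of Shift_JIS, and `char_starts`/`find_aligned` are illustrative names, not the algorithms from the paper.

```python
def char_starts(data: bytes) -> set[int]:
    """Byte offsets where a Shift_JIS character begins.

    Simplified rule: 0x81-0x9F and 0xE0-0xFC are lead bytes of two-byte
    characters; every other byte is a one-byte character.
    """
    starts: set[int] = set()
    i = 0
    while i < len(data):
        starts.add(i)
        b = data[i]
        i += 2 if (0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC) else 1
    return starts


def find_aligned(data: bytes, pattern: bytes) -> int:
    """Like bytes.find, but only accepts matches that begin on a character boundary.

    A well-formed pattern that starts on a boundary also ends on one, because
    parsing from a boundary follows the pattern's own character structure.
    """
    starts = char_starts(data)
    pos = data.find(pattern)
    while pos != -1 and pos not in starts:
        pos = data.find(pattern, pos + 1)
    return pos


if __name__ == "__main__":
    text = "アイA".encode("shift_jis")  # b'\x83A\x83CA'
    print(text.find(b"A"))  # 1  (false match inside ア)
    print(find_aligned(text, b"A"))  # 4
    print(find_aligned(text, "イ".encode("shift_jis")))  # 2
```

Computing all boundaries up front costs a full pass over the text; it keeps the example short, whereas a dedicated "as-is" matcher would track character alignment during the search itself.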
Journal:
- CoRR
Volume: abs/0909.1147
Publication date: 2009